Efficient Distributed Top-k Query Processing with Caching

Authors

Norvald H. Ryeng

Akrivi Vlachou

Christos Doulkeridis

Kjetil Nørvåg

Abstract

Recently, there has been increased interest in incorporating rank-aware query operators, such as top-k queries, into database management systems, allowing users to retrieve only the most interesting data objects. In this paper, we propose a cache-based approach for efficiently supporting top-k queries in distributed database management systems. In large distributed systems, query performance depends mainly on the network cost, measured as the number of tuples transmitted over the network. Ideally, only the k tuples that belong to the query result set should be transmitted. Nevertheless, a server cannot decide, based only on its local data, which tuples belong to the result set. Therefore, we cache previous results to reduce the number of tuples that must be fetched over the network. To this end, our approach always delivers as many tuples as possible from the cache and constructs a remainder query to fetch the remaining tuples. This differs from existing distributed approaches, which must re-execute the entire top-k query when the cached entries are not sufficient to provide the result set. We demonstrate the feasibility and efficiency of our approach through implementation in a distributed database management system.
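
The cache-plus-remainder idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the names `Row`, `CacheEntry`, `fetch_remainder`, and `top_k` are hypothetical, and the sketch assumes that higher scores rank first, that the cache holds an exact prefix of the global ranking for the same scoring function, and that each server is simply an in-memory list of rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: int
    score: float  # assumption: higher score means better rank


@dataclass
class CacheEntry:
    # Assumption: exact global top rows of an earlier query that used the
    # same scoring function, sorted by descending score.
    rows: list


def rank(rows):
    """Sort rows by descending score; ties are broken by key for determinism."""
    return sorted(rows, key=lambda r: (-r.score, r.key))


def fetch_remainder(servers, need, exclude_keys, max_score):
    """Hypothetical remainder query.

    Each server ships at most `need` of its best rows that are not already
    cached and do not score above the lowest cached score.
    Returns (best `need` rows overall, number of rows transferred).
    """
    shipped = []
    for server_rows in servers:
        local = [
            r for r in server_rows
            if r.key not in exclude_keys and r.score <= max_score
        ]
        shipped.extend(rank(local)[:need])
    return rank(shipped)[:need], len(shipped)


def top_k(k, cache, servers):
    """Answer a top-k query, serving as many rows as possible from cache."""
    cached = cache.rows[:k]
    need = k - len(cached)
    if need == 0:
        return cached, 0  # fully answered from cache, nothing transferred
    bound = cached[-1].score if cached else float("inf")
    exclude = {r.key for r in cached}
    rest, transferred = fetch_remainder(servers, need, exclude, bound)
    return cached + rest, transferred


if __name__ == "__main__":
    servers = [
        [Row(1, 0.9), Row(2, 0.5), Row(3, 0.1)],
        [Row(4, 0.8), Row(5, 0.7), Row(6, 0.2)],
    ]
    cache = CacheEntry(rows=[Row(1, 0.9), Row(4, 0.8)])  # an earlier top-2
    result, transferred = top_k(3, cache, servers)
    print([r.key for r in result], transferred)
```

In this toy setup, a top-3 query reuses the two cached rows and asks each server for only one more candidate, instead of re-running the whole query against every server.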

Similar resources

Efficient top-k processing in large-scaled distributed environments

The rapid development of networking technologies has made it possible to construct a distributed database that involves a huge number of sites. Query processing in such a large-scaled system poses serious challenges beyond the scope of traditional distributed algorithms. In this paper, we propose a new algorithm BRANCA for performing top-k retrieval in these environments. Integrating two orthog...

Top-k Semantic Caching

The subject of this thesis is the intelligent caching of top-k queries in an environment with high latency and low throughput. In such an environment, caching can be used to reduce network traffic and improve response time. Slow database connections of mobile devices and to databases, which have been offshored, are practical use cases. A semantic cache is a query-based cache that caches query r...

Query-Driven Indexing in Large-Scale Distributed Systems

Efficient and effective search in large-scale data repositories requires complex indexing solutions deployed on a large number of servers. Web search engines such as Google and Yahoo! already rely upon complex systems to be able to return relevant query results and keep processing times within the comfortable sub-second limit. Nevertheless, the exponential growth of the amount of content on the...

Efficient Top-k Query Processing Algorithms in Highly Distributed Environments

Efficient top-k query processing in highly distributed environments is a valuable but challenging research topic. This paper focuses on the problem over vertically partitioned data and aims to propose more efficient algorithms. The effort is put on limiting the data transferred and communication round trips among nodes to reduce the communication cost of the query processing. Two novel algorit...

Efficient query processing and index tuning using proximity scores

In the presence of growing data, the need for efficient query processing under result quality and index size control becomes more and more a challenge to search engines. We show how to use proximity scores to make query processing effective and efficient with focus on either of the optimization goals. More precisely, we make the following contributions: • We present a comprehensive comparative ...

Publication date: 2011
